This project conducts a thorough exploration and analysis of COVID-19 data related to cases and deaths across Brazil, covering the period from early 2020 to early 2022. The analysis solely utilizes R for data preprocessing, manipulation, and visualisation, aiming to identify key patterns and insights into the pandemic’s impact across different Brazilian regions.
The dataset, sourced from an official Brazilian database, provides detailed records from epidemiological reports across the country.
The initial step involves cleaning and preparing the data for analysis using R, focusing on selecting necessary variables and correcting data discrepancies.
The analysis then dives deeper into the data, using statistical techniques and visualisations to uncover temporal and regional trends in case and death rates.
The findings underscore the importance of tailored public health responses and provide valuable insights for policymakers and health professionals.
#install.packages('remotes')
#remotes::install_github('liibre/coronabr')
library(coronabr)
library(data.table)
library(dplyr)
library(lubridate)
library(ggplot2)
library(plotly)
library(ggpubr)
In this project, a dataset from an official Brazilian source was used to explore the evolution of Covid19 over a period of 2 years. In this first part, the data was pre-processed; in the next section, it was explored further in order to extract insights.
The dataset was loaded using the ‘coronabr’ package. The first rows of the data are displayed below.
data <- get_corona_br(save = FALSE)
head(data)
## # A tibble: 6 × 18
## city city_ibge_code date epidemiological_week estimated_population
## <chr> <dbl> <date> <dbl> <dbl>
## 1 Rio Branco 1200401 2020-03-17 202012 413418
## 2 <NA> 12 2020-03-17 202012 894470
## 3 Rio Branco 1200401 2020-03-18 202012 413418
## 4 <NA> 12 2020-03-18 202012 894470
## 5 Rio Branco 1200401 2020-03-19 202012 413418
## 6 <NA> 12 2020-03-19 202012 894470
## # ℹ 13 more variables: estimated_population_2019 <dbl>, is_last <lgl>,
## # is_repeated <lgl>, last_available_confirmed <dbl>,
## # last_available_confirmed_per_100k_inhabitants <dbl>,
## # last_available_date <date>, last_available_death_rate <dbl>,
## # last_available_deaths <dbl>, order_for_place <dbl>, place_type <chr>,
## # state <fct>, new_confirmed <dbl>, new_deaths <dbl>
As we can see above, this dataset has many variables. Only a few of them were needed for this project (‘date’, ‘state’, ‘estimated_population’, ‘new_confirmed’ and ‘new_deaths’), so these were selected next. The variable ‘city’ was also included because, according to the dictionary of variables, NA values in this variable represent state-level data, so it was used to filter the data. Later on, this variable was removed as well.
However, before selecting the variables of interest, a copy of the original dataset was created so that any information needed later on can be recovered without loading the dataset again.
original_data <- copy(data)
data <- data %>%
select(date, city, state, estimated_population, new_confirmed, new_deaths)
head(data)
## # A tibble: 6 × 6
## date city state estimated_population new_confirmed new_deaths
## <date> <chr> <fct> <dbl> <dbl> <dbl>
## 1 2020-03-17 Rio Branco AC 413418 3 0
## 2 2020-03-17 <NA> AC 894470 3 0
## 3 2020-03-18 Rio Branco AC 413418 0 0
## 4 2020-03-18 <NA> AC 894470 0 0
## 5 2020-03-19 Rio Branco AC 413418 1 0
## 6 2020-03-19 <NA> AC 894470 1 0
data <- data[is.na(data$city),]
data <- select(data, -city)
head(data)
## # A tibble: 6 × 5
## date state estimated_population new_confirmed new_deaths
## <date> <fct> <dbl> <dbl> <dbl>
## 1 2020-03-17 AC 894470 3 0
## 2 2020-03-18 AC 894470 0 0
## 3 2020-03-19 AC 894470 1 0
## 4 2020-03-20 AC 894470 3 0
## 5 2020-03-21 AC 894470 4 0
## 6 2020-03-22 AC 894470 0 0
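The state-level filter above relies on missing `city` values. The original tibble also lists a `place_type` column, which might express the same filter more explicitly. The sketch below runs on a small made-up table and assumes that `place_type` is coded as 'city' and 'state'; that coding is an assumption and should be checked against the dictionary of variables.

```r
library(dplyr)

# Toy rows mimicking the structure of the original data (values are made up)
toy <- tibble(
  city          = c("Rio Branco", NA, "Rio Branco", NA),
  place_type    = c("city", "state", "city", "state"),  # assumed coding
  state         = c("AC", "AC", "AC", "AC"),
  new_confirmed = c(3, 3, 0, 0)
)

# Filter used in this report: rows with a missing city are state totals
by_na_city <- toy %>% filter(is.na(city))

# Alternative filter on the assumed place_type coding
by_place_type <- toy %>% filter(place_type == "state")

# Both approaches keep the same rows on this toy table
identical(by_na_city, by_place_type)
```

If the two filters ever disagree on the real data, that would point to rows that need a closer look before aggregation.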
dim(data)
## [1] 20119 5
glimpse(data)
## Rows: 20,119
## Columns: 5
## $ date <date> 2020-03-17, 2020-03-18, 2020-03-19, 2020-03-20, …
## $ state <fct> AC, AC, AC, AC, AC, AC, AC, AC, AC, AC, AC, AC, A…
## $ estimated_population <dbl> 894470, 894470, 894470, 894470, 894470, 894470, 8…
## $ new_confirmed <dbl> 3, 0, 1, 3, 4, 0, 6, 4, 2, 0, 2, 0, 9, 7, 1, 1, 2…
## $ new_deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
lapply(data, function(x) { sum(is.na(x)) })
## $date
## [1] 0
##
## $state
## [1] 0
##
## $estimated_population
## [1] 0
##
## $new_confirmed
## [1] 0
##
## $new_deaths
## [1] 0
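The same missing-value check can be written more compactly with `colSums()`, which returns one named count per column instead of a list. This is a minimal sketch on a made-up data frame, not the report's data.

```r
# Toy data frame with one missing value (made-up numbers)
toy <- data.frame(
  new_confirmed = c(3, NA, 1),
  new_deaths    = c(0, 0, 0)
)

# Named vector with the number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts
```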
data <- data %>%
mutate(month = month(date)) %>%
reframe(date, month, state, estimated_population, new_confirmed, new_deaths)
data %>%
select(date) %>%
summarise(
min_date = min(date),
max_date = max(date)
)
## # A tibble: 1 × 2
## min_date max_date
## <date> <date>
## 1 2020-02-25 2022-03-27
data <- data[data$date >= date(ymd('2020-03-28')),]
data %>%
select(state) %>%
unique()
## # A tibble: 27 × 1
## state
## <fct>
## 1 AC
## 2 AL
## 3 AM
## 4 AP
## 5 BA
## 6 CE
## 7 DF
## 8 ES
## 9 GO
## 10 MA
## # ℹ 17 more rows
data %>%
select(state, estimated_population) %>%
unique()
## # A tibble: 27 × 2
## state estimated_population
## <fct> <dbl>
## 1 AC 894470
## 2 AL 3351543
## 3 AM 4207714
## 4 AP 861773
## 5 BA 14930634
## 6 CE 9187103
## 7 DF 3055149
## 8 ES 4064052
## 9 GO 7113540
## 10 MA 7114598
## # ℹ 17 more rows
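The aggregations in this report compute total population as `sum(unique(estimated_population))`. That works here because every state has a different population, but it would undercount if two states happened to share the same value. The sketch below, on made-up numbers with hypothetical state codes 'AA' and 'BB', shows the difference and one possible safeguard using `distinct()`.

```r
library(dplyr)

# Toy daily rows: two hypothetical states that share the same population
toy <- tibble(
  state                = c("AA", "AA", "BB", "BB"),
  estimated_population = c(1000, 1000, 1000, 1000),
  new_confirmed        = c(1, 2, 3, 4)
)

# sum(unique(...)) collapses the identical populations into a single value
pop_unique <- sum(unique(toy$estimated_population))

# Keeping one row per state first preserves both populations
pop_distinct <- toy %>%
  distinct(state, estimated_population) %>%
  summarise(total = sum(estimated_population)) %>%
  pull(total)

c(pop_unique = pop_unique, pop_distinct = pop_distinct)
```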
data %>%
filter(new_confirmed<0) %>%
select(new_confirmed)
## # A tibble: 23 × 1
## new_confirmed
## <dbl>
## 1 -17
## 2 -23
## 3 -2845
## 4 -12028
## 5 -72
## 6 -9
## 7 -507
## 8 -25
## 9 -8246
## 10 -19
## # ℹ 13 more rows
data %>%
filter(new_deaths<0) %>%
select(new_deaths)
## # A tibble: 34 × 1
## new_deaths
## <dbl>
## 1 -1
## 2 -3
## 3 -2
## 4 -2
## 5 -6
## 6 -1
## 7 -2
## 8 -1
## 9 -6
## 10 -1
## # ℹ 24 more rows
# Flip the sign of negative daily counts.
# abs() is vectorised, so every row is covered, including the first one.
data <- data %>%
mutate(new_confirmed = abs(new_confirmed),
new_deaths = abs(new_deaths))
data %>%
filter(new_confirmed<0) %>%
select(new_confirmed)
## # A tibble: 0 × 1
## # ℹ 1 variable: new_confirmed <dbl>
data %>%
filter(new_deaths<0) %>%
select(new_deaths)
## # A tibble: 0 × 1
## # ℹ 1 variable: new_deaths <dbl>
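Flipping the sign is one way to treat negative daily counts, which usually come from retroactive corrections. The sketch below, on made-up values, shows that treatment in vectorised form next to a different assumption, namely clamping negative values to zero. Which treatment is appropriate depends on how the source records corrections, which this sketch does not settle.

```r
library(dplyr)

# Toy daily counts with negative corrections (made-up numbers)
toy <- tibble(
  new_confirmed = c(3, -17, 0, -23),
  new_deaths    = c(0, -1, 2, 0)
)

# Same effect as the sign flip: negative values become positive
flipped <- toy %>%
  mutate(across(c(new_confirmed, new_deaths), abs))

# Alternative assumption: treat negative corrections as zero new events
clamped <- toy %>%
  mutate(across(c(new_confirmed, new_deaths), ~ pmax(.x, 0)))

flipped
clamped
```

The two treatments give different totals, so the choice affects every sum computed afterwards.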
data_country_all <- data %>%
summarise(period = '28/03/2020 to 27/03/2022',
country = 'Brazil',
estimated_population = sum(unique(estimated_population)),
new_confirmed = sum(new_confirmed),
new_deaths = sum(new_deaths)) %>%
mutate(prop_confirmed_population = new_confirmed/estimated_population,
prop_deaths_population = new_deaths/estimated_population,
deaths_confirmed_rate = new_deaths/new_confirmed)
data_country_all
## # A tibble: 1 × 8
## period country estimated_population new_confirmed new_deaths
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 28/03/2020 to 27/03/2022 Brazil 211755692 29904964 659726
## # ℹ 3 more variables: prop_confirmed_population <dbl>,
## # prop_deaths_population <dbl>, deaths_confirmed_rate <dbl>
For the period from 28/03/2020 to 27/03/2022, 29,904,964 new confirmed cases and 659,726 deaths due to Covid19 were recorded in Brazil. This corresponds to approximately 14.1% of the Brazilian population with a confirmed case and 0.3% of the population dying from the disease. The ratio of deaths to new confirmed cases was about 2.2%.
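The percentages quoted in this paragraph can be reproduced directly from the totals printed in the output above. The short calculation below repeats the three ratios defined in the code, using the same variable names.

```r
# Totals printed in the data_country_all output
estimated_population <- 211755692
new_confirmed        <- 29904964
new_deaths           <- 659726

# Same three ratios as in the mutate() call
prop_confirmed_population <- new_confirmed / estimated_population
prop_deaths_population    <- new_deaths / estimated_population
deaths_confirmed_rate     <- new_deaths / new_confirmed

# Expressed as percentages, rounded to one decimal place
round(100 * c(prop_confirmed_population,
              prop_deaths_population,
              deaths_confirmed_rate), 1)
```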
Next, the evolution of the Covid19 cases was evaluated. As the number of cases per day varies a lot, a graph built directly from the daily data would be very noisy. Therefore, new confirmed cases and new deaths were summed per week into a new dataframe named data_country_week, which was used to support this analysis.
data_country_week <- data
data_country_week$week <- floor_date(data$date, "week")
data_country_week <- data_country_week %>%
group_by(week) %>%
summarise(estimated_population = sum(unique(estimated_population)),
new_confirmed = sum(new_confirmed),
new_deaths = sum(new_deaths)) %>%
mutate(prop_confirmed_population = new_confirmed/estimated_population,
prop_deaths_population = new_deaths/estimated_population,
deaths_confirmed_rate = new_deaths/new_confirmed) %>%
ungroup()
head(data_country_week)
## # A tibble: 6 × 7
## week estimated_population new_confirmed new_deaths
## <date> <dbl> <dbl> <dbl>
## 1 2020-03-22 211755692 477 22
## 2 2020-03-29 211755692 6428 330
## 3 2020-04-05 211755692 10610 696
## 4 2020-04-12 211755692 16184 1234
## 5 2020-04-19 211755692 22331 1699
## 6 2020-04-26 211755692 38653 2736
## # ℹ 3 more variables: prop_confirmed_population <dbl>,
## # prop_deaths_population <dbl>, deaths_confirmed_rate <dbl>
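To make the weekly grouping concrete, the sketch below applies `floor_date()` to a few made-up daily rows. With lubridate's default settings, each date is mapped to the Sunday that starts its week, which is why the first week in the table above is labelled 2020-03-22 even though the data starts on 2020-03-28.

```r
library(dplyr)
library(lubridate)

# Toy daily counts (made-up numbers)
toy <- tibble(
  date          = as.Date(c("2020-03-28", "2020-03-29", "2020-03-30", "2020-04-05")),
  new_confirmed = c(5, 10, 20, 40)
)

# Map each date to the start of its week, then sum within each week
toy_week <- toy %>%
  mutate(week = floor_date(date, "week")) %>%
  group_by(week) %>%
  summarise(new_confirmed = sum(new_confirmed))

toy_week
```

Because the first and last weeks of the real data may be incomplete, their totals are not directly comparable with those of full weeks.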
colors <- c('Death' = 'red', 'Confirmed' = 'blue', 'Death/Confirmed Rate' = 'black')
x_annotation <- date(ymd_hms('2020-01-01 23:12:13', tz = 'America/New_York'))
ggplot(data=data_country_week) +
geom_line(mapping = aes(x=week, y=new_confirmed / 10000, color='Confirmed')) +
geom_line(mapping = aes(x=week, y=new_deaths / 1000, color='Death')) +
annotate(geom='label', x=x_annotation, y=114, label='BRAZIL (total)
Period: 28/03/2020 to 27/03/2022
Confirmed Cases: 29,904,964
Deaths: 659,726', color='black', hjust = 0, fontface='bold', size=4) +
labs(x = 'Week', color = 'Legend') +
ggtitle('Evolution of Confirmed and Death Rates in Brazil') +
scale_y_continuous(
'Number of New Confirmed Cases (x 10,000)',
sec.axis = sec_axis(~ . * 1, name = 'Number of New Deaths (x 1,000)')) +
scale_color_manual(values = colors) +
theme(
plot.title = element_text(size = 14, hjust = 0.5, face = 'bold'),
axis.title = element_text(size = 11,face = 'bold'),
axis.text = element_text(size = 10),
axis.title.y.right = element_text(color = 'red'),
axis.title.y.left = element_text(color = 'blue'),
axis.text.y.right = element_text(color = 'red'),
axis.text.y.left = element_text(color = 'blue'),
legend.text = element_text(size = 7),
legend.title = element_text(size = 8, face = 'bold'),
legend.position = c(0.1, 0.6),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(colour = 'black', fill = '#F5F5F5'),
panel.border = element_rect(size = 1, fill = NA)
)
As shown above, the number of new confirmed cases peaked in summer and spring of 2021 and again in summer of 2022. The 2022 peak reached approximately 1,300,000 (130 x 10,000) new confirmed cases per week but lasted for a shorter period than the 2021 peak. The number of new deaths peaked (at about 20,000 new deaths per week) in spring of 2021, coinciding with the 2021 peak of new confirmed cases. Although the 2022 peak of new confirmed cases was more than double the 2021 peak, the number of new deaths in 2022 was dramatically lower than during the earlier peak.
The following graph illustrates the Death/Confirmed Rate of Covid19 in Brazil for the period from 28/03/2020 to 27/03/2022.
x_annotation <- date(ymd_hms('2021-10-01 23:12:13', tz = 'America/New_York'))
ggplot(data=data_country_week) +
geom_line(mapping = aes(x=week, y=deaths_confirmed_rate * 100, color='Death/Confirmed Rate')) +
annotate(geom='label', x=x_annotation, y=6.85, label='BRAZIL (total)
Period: 03/2020 to 03/2022
Death/Confirmed Rate: 2.2%', color='black', hjust = 0, fontface='bold', size=4) +
labs(x = 'Week', y='Death/Confirmed Rate (%)', color = 'Legend') +
ggtitle('Evolution of Death/Confirmed Rate in Brazil') +
scale_color_manual(values = colors) +
theme(
plot.title = element_text(size = 14, hjust = 0.5, face = 'bold'),
axis.title = element_text(size=11,face='bold'),
axis.text = element_text(size=10),
legend.text = element_text(size = 7),
legend.title = element_text(size = 8, face = 'bold'),
legend.position = c(0.89, 0.688),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(colour = 'black'),
panel.border = element_rect(size = 1, fill = NA)
)
This plot confirmed that the ratio of new deaths to new confirmed cases was much lower at the end of the assessed period. It also showed that, even though the peaks of new confirmed cases and new deaths happened in 2021 and 2022, the highest death/confirmed rate was observed right at the beginning of the pandemic, when it nearly reached 8%. This probably happened because we knew little about the disease at the time and were therefore not well prepared to fight it, but we improved a lot over time.
Now, let’s dive a little deeper by looking at the data aggregated by state. First, a new dataset data_state was created with the totals of new confirmed cases and new deaths.
data_state <- data %>%
group_by(state) %>%
summarise(estimated_population = sum(unique(estimated_population)),
new_confirmed = sum(new_confirmed),
new_deaths = sum(new_deaths)) %>%
mutate(prop_confirmed_population = new_confirmed/estimated_population,
prop_deaths_population = new_deaths/estimated_population,
deaths_confirmed_rate = new_deaths/new_confirmed) %>%
ungroup()
state <- c('AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'DF', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP', 'SE', 'TO')
region <- c('N','NE','N','N','NE','NE','CO','SE','CO','NE','CO','CO','SE','N','NE','S','NE','NE','SE','NE','S','N','N','S','SE','NE','N')
region_state_list <- data.frame(region, state)
# Look up each state's region in region_state_list.
# match() compares the factor labels in data_state$state with the character codes.
data_state$region <- region_state_list$region[match(data_state$state, region_state_list$state)]
data_state <- data_state %>%
reframe(region, state, estimated_population, new_confirmed, new_deaths, prop_confirmed_population, prop_deaths_population, deaths_confirmed_rate)
head(data_state)
## # A tibble: 6 × 8
## region state estimated_population new_confirmed new_deaths
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 N AC 894470 123817 1994
## 2 NE AL 3351543 295960 6869
## 3 N AM 4207714 580989 14156
## 4 N AP 861773 160325 2122
## 5 NE BA 14930634 1529977 29658
## 6 NE CE 9187103 1269210 26725
## # ℹ 3 more variables: prop_confirmed_population <dbl>,
## # prop_deaths_population <dbl>, deaths_confirmed_rate <dbl>
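Attaching the region to each state is a lookup, so it can also be written as a join. The sketch below is one possible formulation on a made-up table; `region_lookup` and `toy_state` are hypothetical names, and the lookup contains only a subset of the state-to-region pairs used in this report.

```r
library(dplyr)

# Lookup table with a subset of the state-to-region mapping from this report
region_lookup <- tibble(
  state  = c("AC", "AL", "DF", "SP", "SC"),
  region = c("N", "NE", "CO", "SE", "S")
)

# Toy state totals (made-up numbers); state is a factor, as in the real data
toy_state <- tibble(
  state      = factor(c("AC", "SP", "SC")),
  new_deaths = c(10, 20, 30)
)

toy_state <- toy_state %>%
  mutate(state = as.character(state)) %>%   # join on plain character codes
  left_join(region_lookup, by = "state") %>%
  relocate(region)                          # move region to the first column

toy_state
```

A join leaves `NA` in `region` for any state missing from the lookup, which makes gaps in the mapping easy to spot.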
gg <- ggplot(data_state) +
geom_point(aes(x=prop_confirmed_population * 100,
y=prop_deaths_population * 100,
color=region,
size=deaths_confirmed_rate * 100,
group=state)) +
theme_bw() +
xlab("New Confirmed Case Rate (%)") +
ylab("New Deaths Rate (%)") +
ggtitle("Proportion of New Confirmed Case and Deaths Rates per State") +
labs(color = 'Region') +
guides(size = "none") +
theme(
plot.title = element_text(size = 14, hjust = 0.5, face = 'bold'),
axis.title = element_text(size=11,face='bold'),
axis.text = element_text(size=10),
legend.text = element_text(size = 7),
legend.title = element_text(size = 8, face = 'bold'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(colour = 'black'),
panel.border = element_rect(size = 1, fill = NA))
ggplotly(gg) %>%
highlight("plotly_hover")
The plot above shows bubbles that represent each state of Brazil coloured by its regions. The rate of new confirmed case is plotted on the x axis and the rate of new deaths on the y axis, while the size of the bubbles represents the death/confirmed rate.
Overall, the regions with the highest new confirmed case rate were the South (S) and Central-West (CO), whereas the highest new deaths rate was observed for the Southeast (SE) and CO. The death/confirmed rate was also highest for SE. The Northeast (NE) presented the lowest new confirmed case rate and the lowest new deaths rate but, interestingly, its death/confirmed rate was mostly at a medium level, ranging from 1.64% to 2.56%.
When looking at the states, it is clear that the two big pink bubbles at the top of the plot (‘São Paulo’, SP, and ‘Rio de Janeiro’, RJ) were the ones that presented the highest death/confirmed rates (3.19% and 3.49%, respectively), while ‘Santa Catarina’ (SC) presented the lowest (1.29%). The states with the highest new deaths rate were RJ and ‘Mato Grosso’ (MT), with values above 0.4%. With respect to the new confirmed case rate, the highest values were observed for ‘Espírito Santo’ (ES) and ‘Roraima’ (RR) (above 25%), both of which presented a relatively low death/confirmed rate (< 1.4%).
Let’s create a new dataset named data_state_week to further investigate the evolution of Covid19 per state, although I expect the plot to be very cluttered, as there are 27 states.
data_state_week <- data
data_state_week$week <- floor_date(data$date, "week")
data_state_week <- data_state_week %>%
group_by(state, week) %>%
summarise(estimated_population = sum(unique(estimated_population)),
new_confirmed = sum(new_confirmed),
new_deaths = sum(new_deaths)) %>%
mutate(prop_confirmed_population = new_confirmed/estimated_population,
prop_deaths_population = new_deaths/estimated_population,
deaths_confirmed_rate = new_deaths/new_confirmed) %>%
ungroup()
head(data_state_week)
## # A tibble: 6 × 8
## state week estimated_population new_confirmed new_deaths
## <fct> <date> <dbl> <dbl> <dbl>
## 1 AC 2020-03-22 894470 0 0
## 2 AC 2020-03-29 894470 21 0
## 3 AC 2020-04-05 894470 26 2
## 4 AC 2020-04-12 894470 70 4
## 5 AC 2020-04-19 894470 116 5
## 6 AC 2020-04-26 894470 295 11
## # ℹ 3 more variables: prop_confirmed_population <dbl>,
## # prop_deaths_population <dbl>, deaths_confirmed_rate <dbl>
ggplot(data=data_state_week) +
geom_line(mapping = aes(x=week, y=new_confirmed, group=state ,color=state)) +
labs(x = 'Week', y='Number of New Confirmed Cases') +
ggtitle('Evolution of New Confirmed Cases per State') +
theme(
plot.title = element_text(size = 14, hjust = 0.5, face = 'bold'),
axis.title = element_text(size=11,face='bold'),
axis.text = element_text(size=10),
legend.text = element_text(size = 7),
legend.title = element_text(size = 8, face = 'bold'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(colour = 'black'),
panel.border = element_rect(size = 1, fill = NA)
)
data_region_week <- data
# Look up each row's region in region_state_list (vectorised replacement for a row-by-row search)
data_region_week$region <- region_state_list$region[match(data_region_week$state, region_state_list$state)]
data_region_week$week <- floor_date(data$date, "week")
data_region_week <- data_region_week %>%
group_by(region, week) %>%
summarise(estimated_population = sum(unique(estimated_population)),
new_confirmed = sum(new_confirmed),
new_deaths = sum(new_deaths)) %>%
mutate(prop_confirmed_population = new_confirmed/estimated_population,
prop_deaths_population = new_deaths/estimated_population,
deaths_confirmed_rate = new_deaths/new_confirmed) %>%
ungroup()
data_region_week <- data_region_week %>%
reframe(week, region, estimated_population, new_confirmed, new_deaths, prop_confirmed_population, prop_deaths_population, deaths_confirmed_rate)
head(data_region_week)
## # A tibble: 6 × 8
## week region estimated_population new_confirmed new_deaths
## <date> <chr> <dbl> <dbl> <dbl>
## 1 2020-03-22 CO 16504303 29 0
## 2 2020-03-29 CO 16504303 323 10
## 3 2020-04-05 CO 16504303 341 18
## 4 2020-04-12 CO 16504303 484 23
## 5 2020-04-19 CO 16504303 483 16
## 6 2020-04-26 CO 16504303 1040 15
## # ℹ 3 more variables: prop_confirmed_population <dbl>,
## # prop_deaths_population <dbl>, deaths_confirmed_rate <dbl>
ggplot(data=data_region_week) +
geom_line(mapping = aes(x=week, y=prop_confirmed_population * 100, group=region ,color=region)) +
labs(x = 'Week', y='New Confirmed Cases Rate (%)') +
ggtitle('Evolution of New Confirmed Cases per Region') +
theme(
plot.title = element_text(size = 14, hjust = 0.5, face = 'bold'),
axis.title = element_text(size=11,face='bold'),
axis.text = element_text(size=10),
legend.text = element_text(size = 7),
legend.title = element_text(size = 8, face = 'bold'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(colour = 'black'),
panel.border = element_rect(size = 1, fill = NA)
)
data_region_month <- data
# Look up each row's region in region_state_list (vectorised replacement for a row-by-row search)
data_region_month$region <- region_state_list$region[match(data_region_month$state, region_state_list$state)]
data_region_month$month <- floor_date(data$date, "month")
data_region_month <- data_region_month %>%
group_by(region, month) %>%
summarise(estimated_population = sum(unique(estimated_population)),
new_confirmed = sum(new_confirmed),
new_deaths = sum(new_deaths)) %>%
mutate(prop_confirmed_population = new_confirmed/estimated_population,
prop_deaths_population = new_deaths/estimated_population,
deaths_confirmed_rate = new_deaths/new_confirmed) %>%
ungroup()
data_region_month <- data_region_month %>%
reframe(month, region, estimated_population, new_confirmed, new_deaths, prop_confirmed_population, prop_deaths_population, deaths_confirmed_rate)
head(data_region_month)
## # A tibble: 6 × 8
## month region estimated_population new_confirmed new_deaths
## <date> <chr> <dbl> <dbl> <dbl>
## 1 2020-03-01 CO 16504303 142 4
## 2 2020-04-01 CO 16504303 2290 74
## 3 2020-05-01 CO 16504303 14798 304
## 4 2020-06-01 CO 16504303 81549 1415
## 5 2020-07-01 CO 16504303 154250 3585
## 6 2020-08-01 CO 16504303 184565 4003
## # ℹ 3 more variables: prop_confirmed_population <dbl>,
## # prop_deaths_population <dbl>, deaths_confirmed_rate <dbl>
ggplot(data=data_region_month) +
geom_line(mapping = aes(x=month, y=prop_confirmed_population * 100, group=region ,color=region)) +
labs(x = 'Month', y='New Confirmed Cases Rate (%)') +
ggtitle('Evolution of New Confirmed Cases per Region per Month') +
theme(
plot.title = element_text(size = 14, hjust = 0.5, face = 'bold'),
axis.title = element_text(size=11,face='bold'),
axis.text = element_text(size=10),
legend.text = element_text(size = 7),
legend.title = element_text(size = 8, face = 'bold'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(colour = 'black'),
panel.border = element_rect(size = 1, fill = NA)
)
Now the plot is less cluttered and the curve for each region is much easier to read. It also confirms what was previously mentioned: most of the time, the S presented the highest rate of new confirmed cases, followed by the CO, whereas the NE presented the lowest. Moreover, the highest peaks were observed for the S (about 3%) and CO (about 2.1%) during the first months of 2022.
In the following step, let’s investigate the new deaths rate per region per month.
ggplot(data=data_region_month) +
geom_line(mapping = aes(x=month, y=prop_deaths_population * 100, group=region ,color=region)) +
labs(x = 'Month', y='New Deaths Rate (%)') +
ggtitle('Evolution of New Deaths per Region per Month') +
theme(
plot.title = element_text(size = 14, hjust = 0.5, face = 'bold'),
axis.title = element_text(size=11,face='bold'),
axis.text = element_text(size=10),
legend.text = element_text(size = 7),
legend.title = element_text(size = 8, face = 'bold'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(colour = 'black'),
panel.border = element_rect(size = 1, fill = NA)
)
For the new deaths rate, it is interesting to note that, in the North (N) region, peaks of similar height were reached at the beginning of the pandemic and in summer-spring time. All the other regions followed a similar pattern. The NE again presented the lowest new deaths rate during the studied period, whereas S, CO and SE presented the highest.
In the next step, let’s investigate the death/confirmed rate per region per month.
ggplot(data=data_region_month) +
geom_line(mapping = aes(x=month, y=deaths_confirmed_rate * 100, group=region ,color=region)) +
labs(x = 'Month', y='Death/Confirmed Rate (%)') +
ggtitle('Evolution of Death/Confirmed Rate per Region per Month') +
theme(
plot.title = element_text(size = 14, hjust = 0.5, face = 'bold'),
axis.title = element_text(size=11,face='bold'),
axis.text = element_text(size=10),
legend.text = element_text(size = 7),
legend.title = element_text(size = 8, face = 'bold'),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(colour = 'black'),
panel.border = element_rect(size = 1, fill = NA)
)
In this last plot, it is clear that the SE presented the highest death/confirmed rate for nearly the entire period studied. It is also interesting to note that the S and CO regions had their highest death/confirmed rate peak in summer-spring of 2021 and not at the beginning of the pandemic, unlike the other regions and the national rate.
Finally, as data was available for the entire year of 2021, the total new confirmed cases, total new deaths and death/confirmed rate were calculated for that year, so that the severity of Covid19 can be compared with other common causes of death in Brazil.
data %>%
filter(date >= date(ymd('2021-01-01')), date <= date(ymd('2021-12-31'))) %>%
summarise(total_confirmed = sum(new_confirmed),
total_death = sum(new_deaths),
death_rate = sum(new_deaths) / sum(new_confirmed) * 100)
## # A tibble: 1 × 3
## total_confirmed total_death death_rate
## <dbl> <dbl> <dbl>
## 1 14649109 424629 2.90
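Causes of death are often compared as deaths per 100,000 inhabitants per year. As a rough sketch of how the 2021 total could be put on that scale, the calculation below divides it by the estimated population used throughout this report. Choosing this metric, and using that population figure for 2021, are assumptions made here for illustration.

```r
# Figures taken from the outputs above
total_death_2021     <- 424629      # new deaths recorded in 2021
estimated_population <- 211755692   # population used in this report (assumed valid for 2021)

# Crude Covid19 mortality in 2021, per 100,000 inhabitants
deaths_per_100k <- total_death_2021 / estimated_population * 1e5
round(deaths_per_100k, 1)
```

This value can then be set against published per-100,000 figures for other causes of death, provided they refer to the same year and population basis.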